On the Optimality of the Greedy Heuristic in Wavelet Synopses for Range Queries
Abstract
In recent years, wavelet-based synopses have been shown to be effective for approximate query answering in database systems. The simplest wavelet synopses are constructed by computing the Haar transform over a vector consisting of either the raw data or the prefix sums of the data, and using a greedy heuristic to select the wavelet coefficients that are kept in the synopsis. The greedy heuristic is known to be optimal for point queries w.r.t. the mean-squared error, but no similar optimality result was known for range-sum queries, for which the effectiveness of such synopses had only been shown experimentally. The optimality of the greedy heuristic for point queries follows from the Haar basis being orthonormal in that case, which allows Parseval-based thresholding. Thus, the main technical question addressed in this paper is whether the Haar basis is orthonormal for the case of range-sum queries. We show that it is not orthogonal for range-sum queries over the raw data, and that it is orthonormal for the case of prefix sums. Consequently, we show that a slight variation of the greedy heuristic over the prefix sums of the data is an optimal thresholding w.r.t. the mean-squared error. As a result, we obtain the first linear-time construction of a provably optimal wavelet synopsis for range-sum queries. The crux of our proof is a novel construction of inner products that define the error measured over range-sum queries.
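The point-query case described in the abstract can be illustrated with a minimal Python sketch. It covers only the classical setting (orthonormal Haar transform of the raw data, keeping the largest-magnitude coefficients), not the paper's variation for prefix sums and range-sum queries, which the abstract does not specify. The function names, the `budget` parameter, and the sample data are illustrative assumptions, and the sort-based selection runs in O(n log n) rather than linear time.

```python
import math


def haar_transform(data):
    """Orthonormal Haar transform; len(data) must be a power of two."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [float(x) for x in data]
    length = n
    while length > 1:
        half = length // 2
        level = [0.0] * length
        for i in range(half):
            a, b = coeffs[2 * i], coeffs[2 * i + 1]
            level[i] = (a + b) / math.sqrt(2)         # scaled average
            level[half + i] = (a - b) / math.sqrt(2)  # scaled detail
        coeffs[:length] = level
        length = half
    return coeffs


def inverse_haar_transform(coeffs):
    """Inverse of haar_transform."""
    n = len(coeffs)
    data = list(coeffs)
    length = 2
    while length <= n:
        half = length // 2
        level = [0.0] * length
        for i in range(half):
            s, d = data[i], data[half + i]
            level[2 * i] = (s + d) / math.sqrt(2)
            level[2 * i + 1] = (s - d) / math.sqrt(2)
        data[:length] = level
        length *= 2
    return data


def greedy_threshold(coeffs, budget):
    """Keep the `budget` largest-magnitude coefficients and zero the rest."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:budget])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


def squared_error(data, budget):
    """Sum of squared point-query errors for a synopsis of `budget` coefficients."""
    synopsis = greedy_threshold(haar_transform(data), budget)
    approx = inverse_haar_transform(synopsis)
    return sum((x - y) ** 2 for x, y in zip(data, approx))


if __name__ == "__main__":
    sample = [2, 2, 0, 2, 3, 5, 4, 4]  # made-up example data
    for b in range(len(sample) + 1):
        print(b, round(squared_error(sample, b), 6))
```

Because the transform is orthonormal, Parseval's identity makes the squared reconstruction error equal to the sum of the squares of the dropped coefficients, so dropping the smallest-magnitude coefficients minimizes the error for any fixed budget.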
Similar resources
Probabilistic Wavelet Synopses for Multiple Measures
The recently proposed idea of probabilistic wavelet synopses has enabled their use as a tool for reducing large amounts of data down to compact wavelet synopses that can be used to obtain fast, accurate approximate answers to user queries, while at the same time providing guarantees on the accuracy of individual answers. Relatively little attention, however, has been paid to the problem of usin...
Workload-Based Wavelet Synopses
This paper introduces workload-based wavelet synopses, which exploit query workload information to significantly boost accuracy in approximate query processing. We show that wavelet synopses can adapt effectively to workload information, and that they have significant advantages over previous approaches. An important aspect of our approach is optimizing synopses constructions toward error metri...
A hybrid metaheuristic using fuzzy greedy search operator for combinatorial optimization with specific reference to the travelling salesman problem
We describe a hybrid meta-heuristic algorithm for combinatorial optimization problems, with specific reference to the travelling salesman problem (TSP). The method is a combination of a genetic algorithm (GA) and a greedy randomized adaptive search procedure (GRASP). A new adaptive fuzzy greedy search operator is developed for this hybrid method. Computational experiments using a wide range of...
Structure and Value Synopses for XML Data Graphs
All existing proposals for querying XML (e.g., XQuery) rely on a pattern-specification language that allows (1) path navigation and branching through the label structure of the XML data graph, and (2) predicates on the values of specific path/branch nodes, in order to reach the desired data elements. Optimizing such queries depends crucially on the existence of concise synopsis structures that ...
Optimal Workload-Based Weighted Wavelet Synopses
In recent years, wavelets have been shown to be effective data synopses. We are concerned with the problem of efficiently finding wavelet synopses for massive data sets in situations where information about the query workload is available. We present linear-time, I/O-optimal algorithms for building optimal workload-based wavelet synopses for point queries. The synopses are based on a novel construction ...